Fano's inequality

In information theory, Fano's inequality (also known as the Fano converse and the Fano lemma) relates the average information lost in a noisy channel to the probability of the categorization error. It was derived by Robert Fano in the early 1950s while teaching a Ph.D. seminar in information theory at MIT, and later recorded in his 1961 textbook.

It is used to find a lower bound on the error probability of any decoder, as well as lower bounds for minimax risks in density estimation.


Fano's inequality

Let the random variables X and Y represent input and output messages (out of r + 1 possible messages), with joint probability P(x, y). Fano's inequality is

H(X|Y)\leq H(e)+P(e)\log(r),

where

H\left(X|Y\right)=-\sum_{i,j} P(x_i,y_j)\log P\left(x_i|y_j\right)

is the conditional entropy,

P(e)=\sup_i \sum_{j\not=i} P(y_j|x_i)

is the probability of the communication error, and

H(e)=-P(e)\log P(e)-(1-P(e))\log(1-P(e))

is the corresponding binary entropy.
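As a concrete illustration (not part of Fano's original presentation), the following Python sketch evaluates both sides of the inequality for a hypothetical symmetric channel with r + 1 = 4 messages and uniform inputs, taking logarithms base 2; the channel model, the value of eps, and all variable names are assumptions made for this example.

import numpy as np

def binary_entropy(p):
    # H(e) = -p log p - (1 - p) log(1 - p), with the convention 0 log 0 = 0.
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Hypothetical symmetric channel: each of the r + 1 messages is received
# correctly with probability 1 - eps and turned into one of the r other
# messages with probability eps / r.
r = 3
eps = 0.1
P_y_given_x = np.full((r + 1, r + 1), eps / r)
np.fill_diagonal(P_y_given_x, 1 - eps)

# Uniform input distribution and the induced joint distribution P(x, y).
P_x = np.full(r + 1, 1.0 / (r + 1))
P_xy = P_x[:, None] * P_y_given_x

# Conditional entropy H(X|Y) = -sum_{i,j} P(x_i, y_j) log P(x_i | y_j).
P_y = P_xy.sum(axis=0)
P_x_given_y = P_xy / P_y[None, :]
H_X_given_Y = -np.sum(P_xy * np.log2(P_x_given_y))

# Error probability as defined above: P(e) = sup_i sum_{j != i} P(y_j | x_i).
P_e = max(1 - P_y_given_x[i, i] for i in range(r + 1))

# Right-hand side of Fano's inequality.
rhs = binary_entropy(P_e) + P_e * np.log2(r)
print(f"H(X|Y) = {H_X_given_Y:.4f} <= H(e) + P(e) log r = {rhs:.4f}")

For this symmetric channel the two sides coincide, which shows that the inequality can hold with equality.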

Alternative formulation

Let X be a random variable with density equal to one of r + 1 possible densities f_1,\ldots,f_{r+1}. Furthermore, suppose that the Kullback–Leibler divergence between any pair of densities is bounded,

D_{KL}(f_i\|f_j)\leq \beta \quad \text{for all } i\neq j.

Let \psi(X)\in\{1,\ldots, r+1\} be an estimate of the index. Then

\sup_i P_i(\psi(X)\neq i) \geq 1-\frac{\beta+\log 2}{\log r},

where P_i is the probability measure induced by f_i.
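As a sanity check on this formulation, the following Python sketch (an illustration only; the Gaussian family, the spacing d, and the choice of estimator are assumptions) instantiates the bound for r + 1 = 11 unit-variance Gaussian densities with equally spaced means, for which the Kullback–Leibler divergence between f_i and f_j equals (μ_i − μ_j)^2 / 2 in nats, and compares it with a Monte Carlo estimate of the worst-case error of the maximum-likelihood rule, one particular choice of \psi.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical family: r + 1 unit-variance Gaussians with means 0, d, ..., r*d.
# The largest pairwise KL divergence, attained by the two extreme means, is beta.
r = 10
d = 0.15
means = d * np.arange(r + 1)
beta = (means[-1] - means[0]) ** 2 / 2

# Fano-type lower bound on the worst-case error of *any* estimator psi(X)
# (natural logarithms, to match the KL divergence).
bound = 1 - (beta + np.log(2)) / np.log(r)

# Monte Carlo estimate for one particular estimator: the maximum-likelihood
# rule, which picks the nearest mean. Its worst-case error must respect the bound.
n_trials = 100_000
errors = []
for i, mu in enumerate(means):
    x = rng.normal(mu, 1.0, size=n_trials)
    psi = np.argmin((x[:, None] - means[None, :]) ** 2, axis=1)
    errors.append(np.mean(psi != i))

print(f"Fano lower bound:        {bound:.3f}")
print(f"sup_i P_i(psi(X) != i):  {max(errors):.3f}")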

Generalization

The following generalization is due to Ibragimov and Khasminskii (1979) and to Assouad and Birgé (1983).

Let F be a class of densities with a subclass of r + 1 densities ƒθ such that for any θ ≠ θ′,

\|f_{\theta}-f_{\theta'}\|_{L_1}\geq \alpha,\,
D_{KL}(f_\theta\|f_{\theta'})\leq \beta.\,

Then, in the worst case, the expected value of the error of estimation is bounded from below,

\sup_{f\in \mathbf{F}} E \|f_n-f\|_{L_1}\geq \frac{\alpha}{2}\left(1-\frac{n\beta+\log 2}{\log r}\right),

where ƒn is any density estimator based on a sample of size n.
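As a rough numerical illustration (the values of alpha, beta, and r below are hypothetical), the following Python sketch evaluates this lower bound for several sample sizes; the bound shrinks as n grows and becomes vacuous once n·beta + log 2 exceeds log r.

import numpy as np

# Hypothetical parameters for a subclass of r + 1 densities that are
# alpha-separated in L1 distance and beta-close in KL divergence.
alpha, beta, r = 0.2, 0.01, 1024

for n in (10, 100, 500, 1000):
    bound = (alpha / 2) * (1 - (n * beta + np.log(2)) / np.log(r))
    # A negative value means the bound is vacuous for this choice of subclass.
    print(f"n = {n:4d}:  worst-case L1 risk >= {bound:.4f}")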
